-
We study the prediction problem in the context of the high-dimensional linear regression model. We focus on the practically relevant framework where a fraction of the linear measurements is corrupted while the columns of the design matrix can be moderately correlated. Our findings suggest that for most sparse signals, the Lasso estimator admits strong performance guarantees under more easily verifiable and less stringent assumptions on the design matrix compared to much of the existing literature.
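As a rough illustration of the setting described above (not the paper's analysis), the sketch below simulates a sparse linear model with moderately correlated design columns, corrupts a small fraction of the responses, and fits the Lasso with scikit-learn; the dimensions, sparsity level, correlation profile, and regularization strength are arbitrary choices for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, s = 200, 500, 5          # sample size, dimension, sparsity (illustrative values)

# Sparse ground-truth signal and a design with moderately correlated columns.
beta = np.zeros(d)
beta[:s] = 1.0
cov = 0.3 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))  # AR(1)-type correlation
X = rng.multivariate_normal(np.zeros(d), cov, size=n)
y = X @ beta + 0.5 * rng.standard_normal(n)

# Corrupt a small fraction of the measurements with large outliers.
eps = 0.05
outliers = rng.choice(n, size=int(eps * n), replace=False)
y[outliers] += 20.0 * rng.standard_normal(outliers.size)

# Fit the Lasso; the regularization level here is a rough, unvalidated choice.
lam = 0.1 * np.sqrt(np.log(d) / n)
beta_hat = Lasso(alpha=lam, max_iter=10_000).fit(X, y).coef_
print("estimation error:", np.linalg.norm(beta_hat - beta))
```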
-
The topic of robustness is experiencing a resurgence of interest in the statistical and machine learning communities. In particular, robust algorithms making use of the so-called median of means estimator were shown to satisfy strong performance guarantees for many problems, including estimation of the mean and covariance structure as well as linear regression. In this work, we propose an extension of the median of means principle to the Bayesian framework, leading to the notion of the robust posterior distribution. In particular, we (a) quantify robustness of this posterior to outliers, (b) show that it satisfies a version of the Bernstein-von Mises theorem that connects Bayesian credible sets to the traditional confidence intervals, and (c) demonstrate that our approach performs well in applications.
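The sketch below is only a rough illustration of how a median-of-means device can be combined with a likelihood: for a toy Gaussian mean model, the total log-likelihood is replaced by n times the median of block-averaged log-likelihoods before combining with the prior. The block count, prior, and grid are arbitrary, and this is not a faithful reproduction of the robust posterior construction analyzed in the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, k = 300, 15                      # sample size and number of blocks (illustrative)
x = rng.normal(loc=2.0, scale=1.0, size=n)
x[:10] = 50.0                       # a few gross outliers

blocks = np.array_split(rng.permutation(n), k)
theta_grid = np.linspace(-5.0, 10.0, 2001)

def robust_loglik(theta):
    # Median over blocks of the block-averaged log-likelihood, rescaled by n.
    block_means = [norm.logpdf(x[b], loc=theta, scale=1.0).mean() for b in blocks]
    return n * np.median(block_means)

log_prior = norm.logpdf(theta_grid, loc=0.0, scale=10.0)
log_post = log_prior + np.array([robust_loglik(t) for t in theta_grid])
log_post -= log_post.max()
post = np.exp(log_post)
post /= post.sum() * (theta_grid[1] - theta_grid[0])   # normalize on the grid
print("robust posterior mean ~", np.sum(theta_grid * post) * (theta_grid[1] - theta_grid[0]))
```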
-
This paper investigates asymptotic properties of algorithms that can be viewed as robust analogues of the classical empirical risk minimization. These strategies are based on replacing the usual empirical average by a robust proxy of the mean, such as a variant of the median of means estimator. It is well known by now that the excess risk of the resulting estimators often converges to zero at optimal rates under much weaker assumptions than those required by their classical counterparts. However, less is known about the asymptotic properties of the estimators themselves, for instance, whether robust analogues of the maximum likelihood estimators are asymptotically efficient. We make a step towards answering these questions and show that for a wide class of parametric problems, minimizers of the appropriately defined robust proxy of the risk converge to the minimizers of the true risk at the same rate, and often have the same asymptotic variance, as the estimators obtained by minimizing the usual empirical risk. Finally, we discuss the computational aspects of the problem and demonstrate the performance of the methods under consideration in numerical experiments.
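To make the idea of replacing the empirical average by a robust proxy concrete, here is a minimal sketch (not the paper's algorithm) for a one-dimensional location problem: the risk proxy at each candidate parameter is the median of block-averaged squared losses, and the proxy is minimized over a grid. The block count, heavy-tailed data model, and grid are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 25                              # sample size and number of blocks (illustrative)
x = rng.standard_t(df=2.5, size=n) + 3.0    # heavy-tailed data centered at 3

blocks = np.array_split(rng.permutation(n), k)

def mom_risk(theta):
    # Median-of-means proxy of the risk E[(X - theta)^2].
    return np.median([np.mean((x[b] - theta) ** 2) for b in blocks])

grid = np.linspace(x.min(), x.max(), 4001)
theta_hat = grid[np.argmin([mom_risk(t) for t in grid])]
print("robust ERM estimate:", theta_hat, " sample mean:", x.mean())
```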
-
Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with the name of John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics, graph theory, and the study of elections and social choice. We present improved performance guarantees for empirical Tukey's median, a deepest point associated with the given sample, when the data-generating distribution is elliptically symmetric and possibly anisotropic. Some of our results remain valid in the wider class of affine equivariant estimators. As a corollary of our bounds, we show that the typical diameter of the set of all empirical Tukey's medians scales like $$o(n^{-1/2})$$ where $$n$$ is the sample size. Moreover, when the data are 2-dimensional, we prove that with high probability, the diameter is of order $$O(n^{-3/4}\log^{3/2}(n))$$.
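The following sketch approximates Tukey's halfspace depth of a point by taking, over a number of random directions, the smallest fraction of sample points falling in the corresponding closed halfspace; the sample point with the largest approximate depth then serves as a crude stand-in for the empirical Tukey median. This is only an illustrative approximation, not the algorithms or bounds studied in the paper, and the sample size, covariance, and number of directions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, n_dirs = 400, 2, 500               # sample size, dimension, random directions (illustrative)
X = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.5], [0.5, 1.0]], size=n)

dirs = rng.standard_normal((n_dirs, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

def approx_tukey_depth(z):
    # For each direction u, compute the fraction of points x with <x - z, u> >= 0;
    # the depth is upper-bounded by the minimum over the sampled directions.
    proj = (X - z) @ dirs.T              # shape (n, n_dirs)
    return (proj >= 0).mean(axis=0).min()

depths = np.array([approx_tukey_depth(x) for x in X])
tukey_median = X[np.argmax(depths)]
print("approximate empirical Tukey median:", tukey_median)
```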
-
We consider the high-dimensional linear regression model and assume that a fraction of the measurements are altered by an adversary with complete knowledge of the data and the underlying distribution. We are interested in a scenario where dense additive noise is heavy-tailed while the measurement vectors follow a sub-Gaussian distribution. Within this framework, we establish minimax lower bounds for the performance of an arbitrary estimator that depend on the fraction of corrupted observations as well as the tail behavior of the additive noise. Moreover, we design a modification of the so-called Square-Root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination, and satisfies performance guarantees in the form of sub-Gaussian deviation inequalities that match the lower error bounds, up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the additive noise; and (c) it is computationally tractable as a solution of a convex optimization problem. To analyze the performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that may be of independent interest.
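As a point of reference, the sketch below evaluates one common form of the square-root SLOPE objective, the residual norm ||y - Xb||_2 plus a sorted-L1 penalty sum_j lambda_j |b|_(j) with non-increasing weights; it is an assumed convention rather than the exact objective or tuning used in the paper, the weights are placeholders, and no solver is provided.

```python
import numpy as np

def sqrt_slope_objective(b, X, y, lam):
    """One common form of the square-root SLOPE objective (illustrative only):
    ||y - X b||_2 + sum_j lam_j * |b|_(j), with |b|_(1) >= |b|_(2) >= ... sorted."""
    residual_norm = np.linalg.norm(y - X @ b)
    sorted_abs = np.sort(np.abs(b))[::-1]              # |b| sorted in decreasing order
    return residual_norm + np.dot(np.sort(lam)[::-1], sorted_abs)

rng = np.random.default_rng(4)
n, d = 100, 200
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:3] = 2.0
y = X @ beta + rng.standard_normal(n)

# Placeholder weights inspired by lam_j ~ sqrt(log(2d/j)); not the paper's tuning.
lam = 0.5 * np.sqrt(np.log(2 * d / np.arange(1, d + 1)))
print("objective at the true signal:", sqrt_slope_objective(beta, X, y, lam))
```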
-
We prove Fuk-Nagaev and Rosenthal-type inequalities for the sums of independent random matrices, focusing on the situation when the norms of the matrices possess finite moments of only low orders. Our bounds depend on the “intrinsic” dimensional characteristics such as the effective rank, as opposed to the dimension of the ambient space. We illustrate the advantages of such results in several applications, including new moment inequalities for the sample covariance operators of heavy-tailed distributions. Moreover, we demonstrate that our techniques yield sharpened versions of the moment inequalities for empirical processes.
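For context, the effective rank referred to above is commonly defined as r(Σ) = tr(Σ)/||Σ||, the trace divided by the operator norm, and it can be much smaller than the ambient dimension. The short sketch below computes it for a covariance matrix with a quickly decaying spectrum; the decay profile is an arbitrary choice for illustration.

```python
import numpy as np

def effective_rank(sigma):
    # tr(Sigma) / ||Sigma||_op, a dimension-free measure of "intrinsic dimension".
    return np.trace(sigma) / np.linalg.norm(sigma, ord=2)

d = 1000
eigvals = 1.0 / np.arange(1, d + 1) ** 2      # quickly decaying spectrum (illustrative)
sigma = np.diag(eigvals)
print("ambient dimension:", d, " effective rank:", effective_rank(sigma))
```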
-
This paper is devoted to the statistical properties of the geometric median, a robust measure of centrality for multivariate data, as well as its applications to the problem of mean estimation via the median of means principle. Our main theoretical results include (a) the upper bound for the distance between the mean and the median for general absolutely continuous distributions in $$\mathbb R^d$$, and examples of specific classes of distributions for which these bounds do not depend on the ambient dimension $$d$$; (b) exponential deviation inequalities for the distance between the sample and the population versions of the geometric median, which again depend only on the trace-type quantities and not on the ambient dimension. As a corollary, we deduce the improved bounds for the multivariate median of means estimator that hold for large classes of heavy-tailed distributions.
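Since the geometric median has a well-known iterative characterization, here is a minimal sketch of Weiszfeld's algorithm for computing it; this is a standard method, not specific to the paper, and the stopping rule and the guard against near-coincident points are deliberately simplistic.

```python
import numpy as np

def geometric_median(X, max_iter=200, tol=1e-8):
    """Weiszfeld's algorithm: iteratively re-weighted average with weights 1/||x_i - m||."""
    m = X.mean(axis=0)                       # start from the coordinate-wise mean
    for _ in range(max_iter):
        dist = np.linalg.norm(X - m, axis=1)
        dist = np.maximum(dist, 1e-12)       # crude guard against division by zero
        w = 1.0 / dist
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(5)
X = rng.standard_normal((500, 10)) + 1.0
X[:20] += 100.0                              # a cluster of outliers
print("geometric median (first 3 coordinates):", geometric_median(X)[:3])
```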
-
The goal of this note is to present a modification of the popular median of means estimator that achieves sub-Gaussian deviation bounds with nearly optimal constants under minimal assumptions on the underlying distribution. We build on the recent work on the topic and prove that the desired guarantees can be attained under weaker requirements.
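For reference, the basic (unmodified) median of means estimator that the note builds on can be sketched as follows: split the sample into k blocks, average within each block, and return the median of the block means. The number of blocks and the heavy-tailed data model are arbitrary illustrative choices, and this is not the modified estimator proposed in the note.

```python
import numpy as np

def median_of_means(x, k, rng=None):
    """Basic median-of-means: median of the k block averages of a shuffled sample."""
    rng = np.random.default_rng() if rng is None else rng
    blocks = np.array_split(rng.permutation(x), k)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(6)
x = rng.standard_t(df=2.1, size=10_000)        # heavy-tailed sample with true mean 0
print("sample mean:", x.mean(), " median of means:", median_of_means(x, k=30, rng=rng))
```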
-
This paper addresses the following question: given a sample of i.i.d. random variables with finite variance, can one construct an estimator of the unknown mean that performs nearly as well as if the data were normally distributed? One of the most popular examples achieving this goal is the median of means estimator. However, it is inefficient in the sense that the constants in the resulting bounds are suboptimal. We show that a permutation-invariant modification of the median of means estimator admits deviation guarantees that are sharp up to a $1+o(1)$ factor if the underlying distribution possesses more than $$\frac{3+\sqrt{5}}{2}\approx 2.62$$ moments and is absolutely continuous with respect to the Lebesgue measure. This result yields potential improvements for a variety of algorithms that rely on the median of means estimator as a building block. At the core of our argument are new deviation inequalities for U-statistics whose order is allowed to grow with the sample size, a result that could be of independent interest.